Lexicon Optimization for WFST-Based Speech Recognition Using Acoustic Distance Based Confusability Measure and G2P Conversion
نویسندگان
چکیده
In this paper, we propose a lexicon optimization method based on a confusability measure (CM) to develop a large vocabulary continuous speech recognition (LVCSR) system with unseen words. When a lexicon is built or expanded for unseen words by using grapheme-to-phoneme (G2P) conversion, the lexicon size increases since G2P is generally realized by 1-to-N-best mapping. Thus, the proposed method attempts to prune the confusable words in the lexicon by a CM defined as an acoustic model distance between two phonemic sequences. It is demonstrated by LVCSR experiments that the proposed lexicon optimization method achieves a relative word error rate (WER) reduction of 14.72% in a Wall Street Journal task compared to the 1-to-4-best G2P converted lexicon approach.
منابع مشابه
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB) as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIBchr('39')s archive is one of the richest archives in Iran containing a huge amount of multimedia data. Monitoring this massive volume of data, and brows and retrieval of this archive is one of the key issues for this broadcasting...
متن کاملComparison of Grapheme-to-Phoneme Conversion Methods on a Myanmar Pronunciation Dictionary
Grapheme-to-Phoneme (G2P) conversion is the task of predicting the pronunciation of a word given its graphemic or written form. It is a highly important part of both automatic speech recognition (ASR) and text-to-speech (TTS) systems. In this paper, we evaluate seven G2P conversion approaches: Adaptive Regularization of Weight Vectors (AROW) based structured learning (S-AROW), Conditional Rando...
متن کاملCombining Acoustic Data Driven G2P and Letter-to-Sound Rules for Under Resource Lexicon Generation
In a recent work, we proposed an acoustic data-driven grapheme-to-phoneme (G2P) conversion approach, where the probabilistic relationship between graphemes and phonemes learned through acoustic data is used along with the orthographic transcription of words to infer the phoneme sequence. In this paper, we extend our studies to under-resourced lexicon development problem. More precisely, given a...
متن کاملA Qualitative Evaluation of Phoneme-to-Phoneme Technology
Automatic speech recognition systems apply grapheme-to phoneme transcription (G2P) to model pronunciation of items in the lexicon. General purpose G2P transcriptions are not always accurate, e.g., in a multilingual environment. To improve the transcription quality, G2P transcriptions can be postprocessed using a phoneme-to-phoneme (P2P) converter. This paper discusses the applicability of P2P t...
متن کاملAcoustic data-driven grapheme-to-phoneme conversion in the probabilistic lexical modeling framework
One of the primary steps in building automatic speech recognition (ASR) as well as text-to-speech systems is development of a phonemic lexicon that provides a mapping between each word and its pronunciation as a sequence of phonemes. Phoneme lexicons can be developed by humans through use of linguistic knowledge, however, this would be a costly and time-consuming task. To facilitate this proces...
متن کامل